Suffix arrays with a twist

نویسندگان

  • Tomasz Marek Kowalski
  • Szymon Grabowski
  • Kimmo Fredriksson
  • Marcin Raniszewski
چکیده

The suffix array is a classic full-text index, combining effectiveness with simplicity. We discuss three approaches aiming to improve its efficiency even more: changes to the navigation, data layout and adding extra data. In short, we show that (i) how we search for the right interval boundary impacts significantly the overall search speed, (ii) a B-tree data layout easily wins over the standard one, (iii) the well-known idea of a lookup table for the prefixes of the suffixes can be refined with using compression, (iv) caching prefixes of the suffixes in a helper array can pose a(nother) practical space-time tradeoff.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

On the combinatorics of suffix arrays

We prove several combinatorial properties of suffix arrays, including a characterization of suffix arrays through a bijection with a certain well-defined class of permutations. Our approach is based on the characterization of Burrows-Wheeler arrays given in [1], that we apply by reducing suffix sorting to cyclic shift sorting through the use of an additional sentinel symbol. We show that the ch...

متن کامل

Computing suffix links for suffix trees and arrays

We present a new and simple algorithm to reconstruct suffix links in suffix trees and suffix arrays. The algorithm is based on observations regarding suffix tree construction algorithms. With our algorithm we bring suffix arrays even closer to the ease of use and implementation of suffix trees.

متن کامل

Suffix arrays: what are they good for?

Recently the theoretical community has displayed a flurry of interest in suffix arrays, and compressed suffix arrays. New, asymptotically optimal algorithms for construction, search, and compression of suffix arrays have been proposed. In this talk we will present our investigations into the practicalities of these latest developments. In particular, we investigate whether suffix arrays can ind...

متن کامل

Counting Suffix Arrays and Strings

Suffix arrays are used in various application and research areas like data compression or computational biology. In this work, our goal is to characterize the combinatorial properties of suffix arrays and their enumeration. For fixed alphabet size and string length we count the number of strings sharing the same suffix array and the number of such suffix arrays. Our methods have applications to...

متن کامل

The Virtual Suffix Tree: An Efficient Data Structure for Suffix Trees and Suffix Arrays

We introduce the VST (virtual suffix tree), an efficient data structure for suffix trees and suffix arrays. Starting from the suffix array, we construct the suffix tree, from which we derive the virtual suffix tree. The VST provides the same functionality as the suffix tree, including suffix links, but at a much smaller space requirement. It has the same linear time construction even for large ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • CoRR

دوره abs/1607.08176  شماره 

صفحات  -

تاریخ انتشار 2016